1. GINSU Overview

GINSU is a DARPA Fault Tolerant Networking, Focused Research Topic project. We
improve the integrity and robustness of the Linux host operating system by isolating and
monitoring traffic streams within the kernel. We allow an administrator to pre-allocate
system resources across multiple axes, including process name, connection state, UNIX
user or group identity, and source and/or destination endpoint addresses. GINSU
monitors scarce resources, such as socket buffers, TCP control blocks, TCP ports, CPU
time, etc., across these axes and makes scheduling decisions with respect to the
allotments of these resources and their actual use. GINSU performs “early demultiplexing”
of network traffic in order to determine the ultimate owner of each packet. GINSU
ensures that schedulable entities (network streams, processes, and protocols) are isolated
from each other across all resource boundaries and guarantees that malicious or
unanticipated levels of network traffic cannot compromise operating system and service
integrity.

2.  GINSU  Architecture 

GINSU presently supports the Linux open-source operating system, specifically, any
distribution based on late versions of the 2.4-series kernel. On Linux, GINSU is
implemented as a collection of loadable kernel modules. When the GINSU modules are not
loaded, normal Linux kernel functions are not affected at all. Once the modules are loaded,
either by the system administrator or by automatic boot-time initialization scripts, GINSU
sets up its own state and brings all managed network resources under the GINSU regime.

Below we present a high-level view of the GINSU system architecture.

Figure 1: The High-level GINSU System Architecture

The  wide  arrows  in  the  above  diagram  represent  the  flow  of  network  packets  through  a 
GINSU-enhanced  kernel.  The  diagram  is  ordered  from  bottom  to  top  from  lowest-  (i.e. 
closest  to  hardware)  level  up  through  successive  layers  of  abstraction  to  high-  (i.e. 
application)  level  network  processing  functions. 

GINSU builds upon the best ideas from Scout/Escort (the path abstraction), SILK (open
source host integration), Exokernel (logical stack encapsulation and dynamic packet
classification), and NAI Labs' AMP Channel Stack (traffic isolation), and combines and
extends them into a portable, maintainable, and manageable host package. We embrace a
kernel-space approach rather than a user-space approach, as research results from the
latter have tended to exhibit unacceptable performance and generally require kernel-space
modifications anyway. The cornerstone of the GINSU design is the concept of traffic
slices. A slice subdivides network traffic into demultiplexed “streams.” These slices are
independently scheduled and monitored. One or more catch-all slices are allocated for
unauthenticated traffic. At the heart of GINSU is the ability to track per-traffic-slice
resource consumption in order to provide fairness between authenticated and
unauthenticated traffic resource use, as well as to provide non-interference between
individual traffic slices. Slices may be allocated for each endpoint created by an
application or kernel process for receiving or transmitting network traffic, or for some
aggregate, for example, a particular destination TCP port on the host, or a particular
source subnet. The slice hierarchy and the assignment of traffic classes to slices are
determined by the administrator and applied at run time using command-line utilities.
As traffic is partitioned into slices, the administrator may also set limits on, or
pre-allocate reservations of, network-related kernel resources. These include: connection
tracking structures; message buffers; available bandwidth; and available CPU processing
time.

At the earliest possible point in per-packet processing, GINSU determines, based on
endpoint addressing information contained within the packet, whether the packet belongs
to 1) a known and authenticated receiver, 2) an unauthenticated receiver (either
new or unknown), or 3) no receiver at all. This process is known as early demux. (Note
that the term “receiver” has multiple connotations here: by “receiver” we mean both a
specific endpoint and the traffic slice containing this endpoint.) This analysis is
performed asynchronously at interrupt time using an efficient trie-based approach
pioneered in the Exokernel and ported to the Linux platform as part of this research. Tries
are populated on socket (endpoint) creation and address binding, as appropriate, and
depopulated upon socket destruction. Based on the results of this analysis, GINSU may
simply discard the packet before any host resources or kernel packet processing time are
consumed at all. Otherwise, the packet is either marked for traffic shaping or queued
for subsequent standard protocol processing. Unless the received packet is destined for an
endpoint created by the currently executing process, its processing is deferred until the
owning process is selected for execution by the OS scheduler. This feature is known as
lazy receiver processing, and, together with early demux, it distinguishes GINSU from
standard Linux. The combination of early discard and lazy receiver processing means that,
under certain circumstances, especially those likely to be encountered during an attempted
denial-of-service attack, GINSU requires less processing than unmodified Linux.
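The three-way early-demux decision can be sketched in C as follows. This is a simplified, user-space illustration rather than GINSU's trie code: a linear scan over an endpoint table stands in for the trie lookup, and the names struct endpoint, early_demux, and the DEMUX_* results are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes for the three cases described above. */
enum demux_result {
    DEMUX_AUTHENTICATED,   /* known, authenticated receiver */
    DEMUX_UNAUTHENTICATED, /* receiver exists but is new or unknown */
    DEMUX_NO_RECEIVER      /* no receiver: candidate for early discard */
};

/* Hypothetical endpoint record. In GINSU the equivalent information lives in
 * a trie populated on socket creation/bind and pruned on socket destruction. */
struct endpoint {
    uint32_t local_addr;    /* 0 acts as a wildcard (any local address) */
    uint16_t local_port;
    int      authenticated; /* nonzero if the owning slice is authenticated */
};

/* Classify a packet by destination address and port. An exact address match
 * is preferred over a wildcard match; a linear scan stands in for the trie. */
enum demux_result early_demux(const struct endpoint *tab, size_t n,
                              uint32_t dst_addr, uint16_t dst_port)
{
    const struct endpoint *match = NULL;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].local_port != dst_port)
            continue;
        if (tab[i].local_addr == dst_addr) {
            match = &tab[i];
            break;
        }
        if (tab[i].local_addr == 0 && match == NULL)
            match = &tab[i];
    }
    if (match == NULL)
        return DEMUX_NO_RECEIVER;
    return match->authenticated ? DEMUX_AUTHENTICATED : DEMUX_UNAUTHENTICATED;
}
```

In the DEMUX_NO_RECEIVER case the packet can be dropped before any protocol processing is done, which is the early-discard path described above.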

If  a  packet  is  marked  for  traffic-shaping  then  further  GINSU  processing  takes  place 
immediately.  GINSU  implements  a  modular  framework  for  ingress  traffic-shaping 
modeled  after  existing  Linux  facilities  for  egress  traffic-shaping.  Within  this  framework 
we  modified  the  existing  Linux  hierarchical  token  bucket  (HTB)  egress  packet  scheduler 
for  ingress  operation. 
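The admission test at the heart of a token-bucket shaper can be sketched as below. This is a single, non-hierarchical bucket with hypothetical names (struct tbucket, tb_init, tb_admit); HTB arranges such buckets in a tree and lets classes borrow unused tokens from their parents, which this sketch omits. Time is passed in explicitly, in microseconds, so the behavior is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single token bucket. */
struct tbucket {
    uint64_t rate;      /* refill rate, bytes/second */
    uint64_t burst;     /* bucket depth, bytes */
    uint64_t tokens;    /* tokens currently available, bytes */
    uint64_t last_us;   /* time of last refill, microseconds */
};

void tb_init(struct tbucket *tb, uint64_t rate, uint64_t burst, uint64_t now_us)
{
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = burst;     /* start with a full bucket */
    tb->last_us = now_us;
}

/* Refill according to elapsed time, then admit the packet if enough tokens
 * remain. Returns 1 to pass the packet on, 0 to delay or drop it. */
int tb_admit(struct tbucket *tb, uint64_t pkt_len, uint64_t now_us)
{
    assert(now_us >= tb->last_us);
    uint64_t refill = (now_us - tb->last_us) * tb->rate / 1000000u;

    if (refill > 0) {   /* if zero, keep last_us so short gaps accumulate */
        tb->tokens += refill;
        if (tb->tokens > tb->burst)
            tb->tokens = tb->burst;
        tb->last_us = now_us;   /* sub-byte remainders are discarded */
    }
    if (tb->tokens < pkt_len)
        return 0;
    tb->tokens -= pkt_len;
    return 1;
}
```

Under these assumptions, a rate of 1000 bytes/second with a 1500-byte burst admits one full-sized packet immediately and then refills at about 100 bytes per 0.1 s.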

The GINSU software is partitioned into four loadable kernel modules. The
ginsu_common module contains generic functions for intercepting kernel actions (the
hook API), a flexible, general-purpose hash table implementation (the map API), and a
generic hierarchical resource management framework (the resource API). The ginsu_low
module implements all intelligent GINSU functionality, using the hook and resource
APIs provided by ginsu_common. This functionality includes the dynamic packet
filtering facility (or DPF), the ingress traffic-shaping framework (the ingress-shaping
API), a command-line-based management interface, and all slice and socket resource
tracking logic. The ginsu_sch_htb module implements a hierarchical token bucket (or
HTB) traffic-shaping algorithm for use with ginsu_low's ingress-shaping API. Finally,
the ginsu_proc module provides a read-only view into internal GINSU state via a file tree
in the Linux /proc filesystem.


Figure  2:  Module  Stacking  in  the  GINSU  System 

Developers  wishing  to  extend  the  GINSU  framework  to  allow  additional  resource 
accounting  or  to  provide  some  other  enhanced  functionality  should  make  themselves 
familiar  with  the  GINSU  source  code.  Significant  aspects  of  that  source  code  are 
described  below.  Administrators  interested  in  how  the  GINSU  system  operates,  from  a 
high-level  perspective,  may  refer  to  document  TR-XXX  “GINSU  Administrative  Guide” 
and  the  specially  marked  sections  in  this  document. 

2.1 ginsu_common

The ginsu_common module contains generic functions for intercepting kernel actions
(the hook API), a flexible, general-purpose hash table implementation (the map API), and
a generic hierarchical resource management framework (the resource API).

2.1.1  Hooks 

GINSU  hooks  provide  additional  generic  interception  points  for  various  actions  above 
and  beyond  those  provided  by  a  standard  Linux  kernel.  Also,  various  hooks  for  internal 
GINSU  actions  are  provided.  Currently  the  following  hook  points  are  defined: 

GINSU_HOOK_SCHEDULER

within the system scheduler, invoked whenever a context switch occurs

GINSU_HOOK_TIMER 

within the system clock (timer) soft interrupt handler, invoked once per
timer tick (HZ = 100, i.e. every 10 ms, by default on the Intel IA-32 platform)

GINSU_HOOK_SYS_PRE_FORK 

invoked  at  the  start  of  the  fork  syscall,  just  prior  to  execution  of  privileged 
(kernel)  code 

GINSU_HOOK_SYS_POST_FORK 

invoked  from  within  the  fork  syscall,  subsequent  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_PRE_ACCEPT 

invoked  at  the  start  of  the  accept  socket  syscall,  just  prior  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_POST_ACCEPT 

invoked  from  within  the  accept  socket  syscall,  subsequent  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_PRE_BIND

invoked  at  the  start  of  the  bind  socket  syscall,  just  prior  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_POST_BIND 

invoked  from  within  the  bind  socket  syscall,  subsequent  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_PRE_CLOSE 

invoked  at  the  start  of  the  close  syscall,  just  prior  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_POST_CLOSE 

invoked  from  within  the  close  syscall,  subsequent  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_PRE_CONNECT 

invoked  at  the  start  of  the  connect  socket  syscall,  just  prior  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_POST_CONNECT 

invoked  from  within  the  connect  socket  syscall,  subsequent  to  execution 
of  privileged  code 


GINSU_HOOK_SYS_PRE_LISTEN

invoked at the start of the listen socket syscall, just prior to execution of
privileged  code 

GINSU_HOOK_SYS_POST_LISTEN 

invoked  from  within  the  listen  socket  syscall,  subsequent  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_PRE_SHUTDOWN 

invoked  at  the  start  of  the  shutdown  socket  syscall,  just  prior  to  execution 
of  privileged  code 

GINSU_HOOK_SYS_POST_SHUTDOWN 

invoked  from  within  the  shutdown  socket  syscall,  subsequent  to  execution 
of  privileged  code 

GINSU_HOOK_SYS_PRE_SOCKET 

invoked  at  the  start  of  the  'socket'  socket  syscall,  just  prior  to  execution  of 
privileged  code 

GINSU_HOOK_SYS_POST_SOCKET 

invoked from within the 'socket' socket syscall, subsequent to execution of
privileged  code 

GINSU_HOOK_SYS_PRE_SOCKETPAIR 

invoked  at  the  start  of  the  socketpair  socket  syscall,  just  prior  to 
execution  of  privileged  code 

GINSU_HOOK_SYS_POST_SOCKETPAIR 

invoked  from  within  the  socketpair  socket  syscall,  subsequent  to 
execution  of  privileged  code 

GINSU_HOOK_SYS_PRE_EXIT 

invoked  at  the  start  of  the  exit  syscall,  just  prior  to  execution  of  privileged 
code  (control  will  not  return  to  the  application  after  privileged  actions 
complete) 

GINSU_HOOK_SYS_POST_EXIT 

invoked  from  within  the  exit  syscall,  subsequent  to  execution  of  privileged 
code  (control  will  not  return  to  the  application  after  privileged  actions 
complete) 

GINSU_HOOK_PKT_RX

invoked from the network soft interrupt whenever a packet is received
from  a  network  device 

GINSU_HOOK_PKT_TX 

invoked  whenever  a  packet  is  presented  to  a  network  device  for  immediate 
transmission 

GINSU_HOOK_SLICE_CREATE 

invoked  by  ginsu_low  whenever  a  GINSU  slice  is  created 

GINSU_HOOK_SLICE_DESTROY 

invoked  by  ginsu_low  whenever  a  GINSU  slice  is  destroyed 

GINSU_HOOK_SLICE_SOCK_ASSOCIATE 

invoked by ginsu_low whenever a GINSU socket resource is associated
with  a  parent  slice  resource 

GINSU_HOOK_SLICE_SOCK_DISASSOCIATE 

invoked by ginsu_low whenever a GINSU socket resource is disassociated
from  a  parent  slice  resource 

GINSU_HOOK_SLICE_TASK_ASSOCIATE 

invoked by ginsu_low whenever a GINSU socket resource is associated
with  a  parent  task  resource 

GINSU_HOOK_DPF_INSERT 

invoked by ginsu_low whenever a dynamic packet filter rule is about to be
inserted  into  the  system  trie 

GINSU_HOOK_DPF_DELETE 

invoked by ginsu_low whenever a DPF filter rule is about to be removed
from  the  system  trie 

GINSU_HOOK_GET_PROC_STATS 

invoked  by  ginsu_proc  to  collect  a  human-readable  list  of  global  statistics 
and  indicators  for  presentation  to  the  human  user  via  the  /proc  interface 

A  user  of  the  hook  API  first  registers  one  or  more  functions  to  be  called  at  a  specific  hook 
site.  These  functions  may  be  either  passive,  meaning  they  will  not  seek  to  alter  decisions 
made  by  the  kernel,  or  authoritative,  meaning  they  may  seek  to  alter  decisions  made  by 
the kernel. Authoritative hooks are run before passive hooks. In the event that an
authoritative hook cancels the current action, no further hooks (authoritative or passive)
will be invoked, and an error condition will be signaled to the kernel. A hook function may
be unregistered at any time.


int ginsu_hook_register(int where, int type, ginsu_hook_func_t f);
int ginsu_hook_unregister(int where, ginsu_hook_func_t f);
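The ordering rules above (authoritative hooks before passive ones, with a cancellation short-circuiting everything after it) can be modeled in a few lines. This is a user-space sketch with hypothetical names: the actual ginsu_hook_func_t signature and the values accepted for where and type are not given in this document, so an int (*)(void *) callback returning nonzero to cancel, the HOOK_AUTHORITATIVE and HOOK_PASSIVE constants, and fixed-size tables are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NR_HOOK_SITES      4   /* hypothetical number of hook sites */
#define MAX_HOOKS_PER_SITE 8

enum hook_type { HOOK_AUTHORITATIVE = 0, HOOK_PASSIVE = 1 };

/* Assumed callback shape: return 0 to allow the action, nonzero to cancel it.
 * Return values from passive hooks are ignored. */
typedef int (*hook_fn)(void *arg);

static hook_fn hooks[NR_HOOK_SITES][2][MAX_HOOKS_PER_SITE];

int hook_register(int where, enum hook_type type, hook_fn f)
{
    assert(where >= 0 && where < NR_HOOK_SITES && f != NULL);
    assert(type == HOOK_AUTHORITATIVE || type == HOOK_PASSIVE);
    for (int i = 0; i < MAX_HOOKS_PER_SITE; i++) {
        if (hooks[where][type][i] == NULL) {
            hooks[where][type][i] = f;
            return 0;
        }
    }
    return -1;                          /* no free slot at this site */
}

int hook_unregister(int where, hook_fn f)
{
    assert(where >= 0 && where < NR_HOOK_SITES && f != NULL);
    for (int t = 0; t < 2; t++)
        for (int i = 0; i < MAX_HOOKS_PER_SITE; i++)
            if (hooks[where][t][i] == f) {
                hooks[where][t][i] = NULL;
                return 0;
            }
    return -1;                          /* f was not registered here */
}

/* Authoritative hooks run first; the first cancellation stops all further
 * hooks (including every passive one) and is reported as an error. */
int hook_call(int where, void *arg)
{
    assert(where >= 0 && where < NR_HOOK_SITES);
    for (int i = 0; i < MAX_HOOKS_PER_SITE; i++) {
        hook_fn f = hooks[where][HOOK_AUTHORITATIVE][i];
        if (f != NULL && f(arg) != 0)
            return -1;
    }
    for (int i = 0; i < MAX_HOOKS_PER_SITE; i++) {
        hook_fn f = hooks[where][HOOK_PASSIVE][i];
        if (f != NULL)
            (void)f(arg);
    }
    return 0;
}

/* Example callbacks: a passive counter and an authoritative veto. */
static int passive_calls;
static int count_passive(void *arg) { (void)arg; passive_calls++; return 0; }
static int veto_if_set(void *arg) { return *(const int *)arg; }
```

With veto_if_set registered as authoritative, an argument of 1 cancels the action before count_passive ever runs; once the veto is unregistered, the passive hook runs on every call.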

Internally, the hook API provides a function that GINSU kernel modules use to drive
hook invocations:


int ginsu_hook_call(int where, ...);

The  exact  number  and  type  of  arguments  passed  to  the  hook  function  vary  from  hook  site 
to  hook  site,  so  variadic  functions  are  used. 
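Because argument lists differ from site to site, a variadic dispatcher can capture the arguments once in a va_list and hand each callback its own copy. The sketch below assumes, hypothetically, that callbacks receive a va_list and know the argument layout of the site they are registered at; the names vhook_fn, vhook_call, and record_pid are illustrative only. va_copy (C99) is needed because a va_list that a callee has traversed cannot be reused.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical callback type: receives the site's arguments as a va_list. */
typedef int (*vhook_fn)(va_list ap);

#define MAX_VHOOKS 4
static vhook_fn vhooks[MAX_VHOOKS];    /* a single site, for brevity */

/* Forward the variadic arguments to every registered callback, stopping at
 * the first nonzero return. */
int vhook_call(int where, ...)
{
    va_list ap;
    int rc = 0;

    (void)where;                       /* only one site in this sketch */
    va_start(ap, where);
    for (int i = 0; i < MAX_VHOOKS && rc == 0; i++) {
        if (vhooks[i] == NULL)
            continue;
        va_list copy;
        va_copy(copy, ap);             /* each callback starts at the first argument */
        rc = vhooks[i](copy);
        va_end(copy);
    }
    va_end(ap);
    return rc;
}

/* Example callback for a site whose arguments are (int pid, char *name). */
static int last_pid;
static int record_pid(va_list ap)
{
    last_pid = va_arg(ap, int);
    char *name = va_arg(ap, char *);
    return name == NULL ? -1 : 0;      /* cancel if no name was supplied */
}
```

Each callback must extract exactly the types its site passes; a mismatch between a site's arguments and a callback's va_arg calls is undefined behavior, so a scheme like this depends on documenting each site's argument list.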

2.1.2  Maps 

The ginsu_common module also provides a generic hash table implementation derived
from  the  Scout  [?]  operating  system.  Using  this  "map"  interface,  a  GINSU  programmer 
may  uniformly  manage  dynamic  keyed  lookup  of  a  collection  of  data  items  of  arbitrary 
size. 

These  functions  create  and  destroy  map  structures,  respectively: 


ginsu_map_t ginsu_map_create(int nbuckets, int key_size);
void ginsu_map_destroy(ginsu_map_t m);

This  function  returns  the  current  number  of  items  contained  within  a  map: 

int  ginsu_map_count  (ginsu_map_t  m) ; 

This function returns the size in octets of fixed-size keys, or (size_t)-1 for
variable-length (string) keys:

size_t ginsu_map_key_size(ginsu_map_t m);

These  functions  manage  the  insertion  and  removal  of  items  with  fixed  size  keys  into/from 
a  map: 


ginsu_map_binding_t ginsu_map_bind(ginsu_map_t m, const void *key,
                                   unsigned long value);
int ginsu_map_unbind(ginsu_map_t m, const void *key);
int ginsu_map_remove_binding(ginsu_map_t m, ginsu_map_binding_t b);


This  function  performs  a  keyed  lookup  of  an  item  within  a  map: 

int ginsu_map_resolve(ginsu_map_t m, const void *key, int key_size,
                      unsigned long *value);


These  functions  allow  the  user  to  enumerate  over  all  items  contained  within  a  map: 


void ginsu_map_walk_init(ginsu_map_walk_t w, ginsu_map_t m);
ginsu_map_el_t ginsu_map_walk_next(ginsu_map_walk_t w);
void ginsu_map_walk_done(ginsu_map_walk_t w);


Finally, these functions manage insertion and removal of items with variable-sized
(string) keys into/from a map:

ginsu_map_binding_t ginsu_map_var_bind(ginsu_map_t m, const void *key,
                                       int key_size, unsigned long value);
int ginsu_map_var_unbind(ginsu_map_t m, const void *key, int key_size);
int ginsu_map_var_resolve(ginsu_map_t m, const void *key, int key_size,
                          unsigned long *value);
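To make the fixed-key semantics concrete, here is a minimal separate-chaining table with the same shape as ginsu_map_create, ginsu_map_bind, ginsu_map_resolve, ginsu_map_unbind, and ginsu_map_destroy. It is an illustrative stand-in with hypothetical mini_map_* names, not the Scout-derived code: the FNV-1a hash, the 0/-1 return convention, and the choice to reject duplicate keys in bind are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct mini_binding {
    struct mini_binding *next;
    unsigned long value;
    unsigned char key[];               /* key_size bytes, copied on bind */
};

struct mini_map {
    int nbuckets, key_size, count;
    struct mini_binding **buckets;     /* array of chain heads */
};

/* 32-bit FNV-1a hash over the raw key bytes. */
static uint32_t mini_hash(const unsigned char *p, int len)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < len; i++)
        h = (h ^ p[i]) * 16777619u;
    return h;
}

struct mini_map *mini_map_create(int nbuckets, int key_size)
{
    assert(nbuckets > 0 && key_size > 0);
    struct mini_map *m = calloc(1, sizeof *m);   /* count starts at 0 */
    if (m == NULL)
        return NULL;
    m->buckets = calloc((size_t)nbuckets, sizeof *m->buckets);
    if (m->buckets == NULL) {
        free(m);
        return NULL;
    }
    m->nbuckets = nbuckets;
    m->key_size = key_size;
    return m;
}

void mini_map_destroy(struct mini_map *m)
{
    for (int i = 0; i < m->nbuckets; i++) {
        struct mini_binding *b = m->buckets[i], *next;
        for (; b != NULL; b = next) {
            next = b->next;
            free(b);
        }
    }
    free(m->buckets);
    free(m);
}

/* Return the link pointing at key's binding, or at the end of its chain. */
static struct mini_binding **mini_slot(struct mini_map *m, const void *key)
{
    uint32_t h = mini_hash(key, m->key_size) % (uint32_t)m->nbuckets;
    struct mini_binding **pp = &m->buckets[h];
    while (*pp != NULL && memcmp((*pp)->key, key, (size_t)m->key_size) != 0)
        pp = &(*pp)->next;
    return pp;
}

int mini_map_bind(struct mini_map *m, const void *key, unsigned long value)
{
    struct mini_binding **pp = mini_slot(m, key);
    if (*pp != NULL)
        return -1;                     /* key already bound */
    struct mini_binding *b = malloc(sizeof *b + (size_t)m->key_size);
    if (b == NULL)
        return -1;
    memcpy(b->key, key, (size_t)m->key_size);
    b->value = value;
    b->next = NULL;
    *pp = b;                           /* append at the end of the chain */
    m->count++;
    return 0;
}

int mini_map_resolve(struct mini_map *m, const void *key, unsigned long *value)
{
    struct mini_binding *b = *mini_slot(m, key);
    if (b == NULL)
        return -1;
    *value = b->value;
    return 0;
}

int mini_map_unbind(struct mini_map *m, const void *key)
{
    struct mini_binding **pp = mini_slot(m, key);
    struct mini_binding *b = *pp;
    if (b == NULL)
        return -1;
    *pp = b->next;                     /* splice the binding out */
    free(b);
    m->count--;
    return 0;
}
```

A walk in the style of ginsu_map_walk_init/next/done would visit the same buckets and chains that mini_map_destroy traverses.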


2.1.3  Resource  Management 

The ginsu_common module also exports the GINSU generic hierarchical resource
management framework. Under this regime, arbitrarily sized blobs of binary data
(typically kernel resources or internal GINSU state) may be uniformly managed.
Though the framework is agnostic about the internal structure of resources, it does require
that each distinct resource type be assigned a unique integer selector. One or more
attributes may be associated with a given resource. The framework is also agnostic
regarding the internal structure of attributes. However, for each resource type, unique
integer selectors must be used when getting or setting an attribute. Relationships among
resources are maintained in a directed graph. Parent-child relationships are represented as
bidirectional edges within that graph. When each resource is created, the caller provides
type-specific manage and release methods for initializing and for cleaning up state,
respectively. The primary benefit of the GINSU resource management framework is the
automatic destruction (release) of child resources when their parents are explicitly
destroyed. This framework also makes it trivial to enumerate resources upward from
children to parents, or downward from parents to children. Separately, per-type maps are
used to store pointers to every instance of a given resource type. Type maps are primarily
used for efficient enumeration of all instances of a particular type, but because type map
entries are indexed by unique type-specific keys, a programmer may exploit these maps
for rapid location of a particular resource instance without resorting to an expensive
traversal of the resource hierarchy.
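The central property described above, that destroying a parent automatically releases its children, can be sketched with a small tree. The names (struct res, res_new, res_parent, res_free) are hypothetical; for brevity each resource has at most one parent, whereas the text describes a general directed graph, and the release callback plays the role of the type-specific release method (manage is omitted).

```c
#include <assert.h>
#include <stdlib.h>

struct res {
    int type;
    struct res *parent;                  /* child-to-parent edge */
    struct res *first_child, *next_sibling; /* parent-to-child edges */
    void (*release)(struct res *);       /* type-specific cleanup */
};

struct res *res_new(int type, void (*release)(struct res *))
{
    struct res *r = calloc(1, sizeof *r);
    if (r != NULL) {
        r->type = type;
        r->release = release;
    }
    return r;
}

/* Link r beneath parent in both directions. */
void res_parent(struct res *r, struct res *parent)
{
    assert(r->parent == NULL);
    r->parent = parent;
    r->next_sibling = parent->first_child;
    parent->first_child = r;
}

/* Free r and, recursively, every descendant (children first). */
void res_free(struct res *r)
{
    while (r->first_child != NULL) {
        struct res *c = r->first_child;
        r->first_child = c->next_sibling;
        c->parent = NULL;                /* already unlinked from r */
        res_free(c);
    }
    if (r->parent != NULL) {             /* unlink r from its own parent */
        struct res **pp = &r->parent->first_child;
        while (*pp != r)
            pp = &(*pp)->next_sibling;
        *pp = r->next_sibling;
    }
    if (r->release != NULL)
        r->release(r);
    free(r);
}

/* Example release method that counts invocations. */
static int released;
static void count_release(struct res *r) { (void)r; released++; }
```

Freeing a single socket resource only unlinks it from its slice, while freeing the slice releases whatever sockets remain beneath it.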

Figure 3: The GINSU Generic Resource Framework

This function creates a new resource and associates it with a unique key, a type-specific
manage function, and a type-specific release function:

ginsu_resource_t ginsu_resource_new(int type, void *key, int key_len,
                                    void *data, int data_len,
                                    int (*manage)(ginsu_resource_t),
                                    int (*release)(ginsu_resource_t));

This function frees (releases) a resource and any children:

void ginsu_resource_free(ginsu_resource_t r);

This  function  frees  all  resources  of  a  given  type  and  any  of  their  children,  regardless  of 
type: 

void ginsu_resource_free_all(int type);

These  functions  allow  the  user  to  get  or  set  resource  attributes: 

int ginsu_resource_get_attr(ginsu_resource_t r, int attr,
                            void **data, int *data_len);

int ginsu_resource_set_attr(ginsu_resource_t r, int attr,
                            void *data, int data_len);

int ginsu_resource_get_attr_simple(ginsu_resource_t r, int attr,
                                   unsigned long *val);

int ginsu_resource_set_attr_simple(ginsu_resource_t r, int attr,
                                   unsigned long val);

void *ginsu_resource_get_data(ginsu_resource_t r, int *data_len);
int ginsu_resource_set_data(ginsu_resource_t r, void *data, int data_len);

These functions enumerate the resource hierarchy, starting at a specific resource if 'r' is
non-NULL, or at the roots of the specified type otherwise. During the enumeration, a
canned action is taken whenever a resource of the desired type is found; the suffix of each
function name indicates the action (find_first, min, max, sum, inc, dec) and the direction
of traversal (up or down).


int ginsu_resource_enum_attr_find_first_up(ginsu_resource_t r,
        int which, int attr, ginsu_resource_t *result);

int ginsu_resource_enum_attr_find_first_down(ginsu_resource_t r,
        int which, int attr, ginsu_resource_t *result);

int ginsu_resource_enum_attr_simple_min_up(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_min_down(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_max_up(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_max_down(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_sum_up(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_sum_down(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_inc_up(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_inc_down(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_dec_up(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

int ginsu_resource_enum_attr_simple_dec_down(ginsu_resource_t r,
        int which, int nr_attrs, int *attrs, unsigned long *vals);

These  functions  will  enumerate  the  resource  hierarchy  and  compare  two  attributes:  a 
limit,  and  a  count.  If  the  count  attribute  for  a  resource  is  found  to  exceed  the  limit 
attribute  for  that  same  resource,  a  pointer  to  that  resource  will  be  returned  in  result. 

int ginsu_resource_enum_attr_simple_test_limit_up(ginsu_resource_t r,
        int which, int limit_attr, int count_attr, ginsu_resource_t *result);

int ginsu_resource_enum_attr_simple_test_limit_down(ginsu_resource_t r,
        int which, int limit_attr, int count_attr, ginsu_resource_t *result);
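An upward limit test can be sketched as a walk from a resource through its ancestors that reports the first one whose count exceeds its limit. For brevity this assumes, hypothetically, that each node keeps its simple attributes in a small array indexed by attribute selector and has a single parent pointer; struct node, test_limit_up, and the ATTR_* selectors are illustrative names, and the which argument of the real functions is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ATTRS 4                          /* hypothetical attribute count */
enum { ATTR_SOCK_LIMIT, ATTR_SOCK_COUNT };  /* hypothetical selectors */

struct node {
    struct node *parent;
    unsigned long attr[NR_ATTRS];           /* simple (unsigned long) attributes */
};

/* Walk from r toward the root. Returns 1 and sets *result to the first
 * resource whose count attribute exceeds its limit attribute, else 0. */
int test_limit_up(struct node *r, int limit_attr, int count_attr,
                  struct node **result)
{
    assert(limit_attr >= 0 && limit_attr < NR_ATTRS);
    assert(count_attr >= 0 && count_attr < NR_ATTRS);
    for (; r != NULL; r = r->parent) {
        if (r->attr[count_attr] > r->attr[limit_attr]) {
            *result = r;
            return 1;
        }
    }
    *result = NULL;
    return 0;
}
```

Walking upward means a socket that is within its own limit can still be refused because its parent slice is over its aggregate limit.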

This function allows the user to enumerate all resources of a given type, one at a time:

ginsu_resource_t ginsu_resource_enumerate(int type, unsigned long *cookie);

This function locates a resource given a type and a unique key:

ginsu_resource_t ginsu_resource_find(int type, void *key, int key_len);

These functions let the user manage parent-child relationships among resources:

int ginsu_resource_parent(ginsu_resource_t r, ginsu_resource_t parent);
int ginsu_resource_reparent(ginsu_resource_t r,
        ginsu_resource_t old_parent, ginsu_resource_t new_parent);
int ginsu_resource_unparent(ginsu_resource_t r, ginsu_resource_t parent);


2.1.4  Logging 

When  resource  limits  are  exceeded,  or  when  current  system  conditions  require  proactive 
action  to  maintain  a  specified  resource  reservation,  GINSU  will  send  a  detailed  message 
to  the  system  logging  (syslog)  facility.  A  table  describing  the  format  of  these  messages 
and  presenting  an  example  follows: 


timestamp        module  facility  resource      message data
Jun 30 14:21:26  GINSU:  SLICE:    socket limit  Sid=31 now=100 limit=100

The  following  function  provides  the  programmatic  interface  to  this  facility: 


#define GINSU_SYSLOG_EMERG    0  /* system is unusable */
#define GINSU_SYSLOG_ALERT    1  /* action must be taken immediately */
#define GINSU_SYSLOG_CRIT     2  /* critical conditions */
#define GINSU_SYSLOG_ERR      3  /* error conditions */
#define GINSU_SYSLOG_WARNING  4  /* warning conditions */
#define GINSU_SYSLOG_NOTICE   5  /* normal, but significant conditions */
#define GINSU_SYSLOG_INFO     6  /* informational */
#define GINSU_SYSLOG_DEBUG    7  /* debug-level messages */

int ginsu_syslog(int severity, char *module, char *facility,
                 char *resource, char *fmt, ...);
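A plausible shape for the message formatting, matching the columns of the example above, is sketched below. It formats into a caller-supplied buffer with snprintf/vsnprintf instead of handing the line to the kernel log (in-kernel, the result would presumably go to printk with a matching KERN_* level); the name ginsu_format_msg is hypothetical, and the timestamp column is assumed to come from the system logger, so it is not produced here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter producing "<module>: <facility>: <resource> <message>".
 * Returns the length of the full line (as vsnprintf does, even if buf was
 * too small), or -1 on an invalid severity or a formatting error. */
int ginsu_format_msg(char *buf, size_t len, int severity,
                     const char *module, const char *facility,
                     const char *resource, const char *fmt, ...)
{
    va_list ap;
    int head, body;

    if (severity < 0 || severity > 7)  /* GINSU_SYSLOG_EMERG..GINSU_SYSLOG_DEBUG */
        return -1;
    head = snprintf(buf, len, "%s: %s: %s ", module, facility, resource);
    if (head < 0)
        return -1;
    va_start(ap, fmt);
    if ((size_t)head < len)            /* append the message after the header */
        body = vsnprintf(buf + head, len - (size_t)head, fmt, ap);
    else                               /* no room left: only measure */
        body = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    return body < 0 ? -1 : head + body;
}
```

For example, a call with module "GINSU", facility "SLICE", resource "socket limit", and the format "Sid=%d now=%d limit=%d" reproduces the message portion of the table row above.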


2.2 ginsu_low

The ginsu_low module implements all intelligent GINSU functionality, using the hook
and resource APIs provided by ginsu_common. This functionality includes the dynamic
packet-filtering facility (or DPF), the ingress traffic-shaping framework (the
ingress-shaping API), a command-line-based management interface, and all slice and
socket resource tracking logic.

As previously discussed, GINSU partitions network traffic into distinct slices. Each slice
may be individually associated with resource limits or reservations. A default slice
collects all traffic not otherwise directed. A hierarchical token bucket traffic-shaping
scheme (implemented in conjunction with the ginsu_sch_htb module) provides effective
limits and reservations on different classes of network traffic. Simple limits on message
buffer memory, connection table entries, and CPU timeslice consumption are also
provided. Simple reservations for connection table entries and network bandwidth usage
are supported. Internally, Linux creates a socket for each distinct endpoint. GINSU
dynamically creates and destroys socket resources as Linux sockets are created or
destroyed by applications. Sockets are automatically assigned to slices according to their
source and/or destination endpoint addressing information. ginsu_low uses the socket
syscall interposition points provided by ginsu_common to monitor application activity for
socket creation, endpoint address binding, and socket destruction events.
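Assignment of a socket to a slice can be pictured as a first-match scan over administrator rules keyed on addressing information, with the default slice as the fallback. This is a hypothetical sketch: the rule layout (a source subnet plus an optional destination port), the first-match policy, the names struct slice_rule and assign_slice, and slice ID 0 for the default slice are all assumptions, not GINSU's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_SLICE 0   /* hypothetical SID of the catch-all slice */

struct slice_rule {
    uint32_t src_net, src_mask;  /* host byte order; mask 0 matches any source */
    uint16_t dst_port;           /* 0 matches any destination port */
    int      sid;                /* slice to assign on a match */
};

/* Return the SID of the first rule matching the socket's endpoints,
 * or DEFAULT_SLICE if no rule matches. */
int assign_slice(const struct slice_rule *rules, size_t n,
                 uint32_t src_addr, uint16_t dst_port)
{
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct slice_rule *r = &rules[i];
        if ((src_addr & r->src_mask) != (r->src_net & r->src_mask))
            continue;
        if (r->dst_port != 0 && r->dst_port != dst_port)
            continue;
        return r->sid;
    }
    return DEFAULT_SLICE;
}
```

Because the first match wins, more specific rules would need to be listed before broader ones.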

2.2.1 Installing Resource Constraints


An administrator may install slice partitioning rules and their corresponding reservations
and limits through extensions to the standard Linux iptables packet-matching rule
language. iptables is the Linux interface to the network subsystem for use in firewalling,
packet inspection, or packet rewriting applications. We have added a custom GINSU
iptables table and LIMIT and RESERVE rule targets.


Usage:

  iptables -[ADC] chain rule-specification [options]
  iptables -[RI] chain rulenum rule-specification [options]
  iptables -D chain rulenum [options]
  iptables -[LFZ] [chain] [options]
  iptables -[NX] chain
  iptables -E old-chain-name new-chain-name
  iptables -P chain target [options]


LIMIT target options:

  --limit-timeslice <percent>
        Limit the total percentage of each timeslice that may be
        consumed by the owner process for this flow.

  --limit-bandwidth <bits>
        Limit flow bandwidth to <bits> bits/second.

  --limit-bandwidth-octets <octets>
        Like --limit-bandwidth, but in units of octets instead of bits.

  --limit-sockets <count>
        Cap the total number of unique endpoints allowed for the given flow.

  --limit-connections <count>
        Cap the total number of connections for the given flow
        (includes half-open connections).

  --limit-queue <count>
        Limit the maximum number of queued sk_buffs for the given flow.

  --renice <priority>
        Renice (decrease) the base priority of the owner process to the
        given limit.

  --euid <euid>
        Only match if effective user == euid.

  --egid <egid>
        Only match if effective group == egid.

  --sid <sid>
        Attach rule to slice with given SID.


RESERVE target options:

  --reserve-bandwidth <bits>
        Reserve flow bandwidth of <bits> bits/second.

  --reserve-bandwidth-octets <octets>
        Like --reserve-bandwidth, but in units of octets instead of bits.

  --reserve-connections <count>
        Reserve a number of connection slots for the given flow.

  --renice <priority>
        Renice (increase) the base priority of the owner process up to the
        given reservation.

  --euid <euid>
        Only match if effective user == euid.

  --egid <egid>
        Only match if effective group == egid.

  --sid <sid>
        Attach rule to slice with given SID.


These  custom  targets  are  in  addition  to  the  traditional  iptables  matches  and  extensions, 
portions  of  which  are  excerpted  below  from  the  iptables  manual: 

PARAMETERS 

The following parameters make up a rule specification (as used in the add, delete, insert,
replace and append commands).

-p, --protocol [!] protocol

The  protocol  of  the  rule  or  of  the  packet  to  check.  The  specified  protocol  can  be  one  of 
tcp,  udp,  icmp,  or  all,  or  it  can  be  a  numeric  value,  representing  one  of  these  protocols  or 
a  different  one.  A  protocol  name  from  /etc/protocols  is  also  allowed.  A  "!"  argument 
before  the  protocol  inverts  the  test.  The  number  zero  is  equivalent  to  all.  Protocol  all  will 
match  with  all  protocols  and  is  taken  as  default  when  this  option  is  omitted. 

-s, --source [!] address[/mask]

Source specification. Address can be either a hostname, a network name, or a plain IP
address. The mask can be either a network mask or a plain number, specifying the
number of 1's at the left side of the network mask. Thus, a mask of 24 is equivalent to
255.255.255.0. A "!" argument before the address specification inverts the sense of the
address. The flag --src is a convenient alias for this option.

-d, --destination [!] address[/mask]

Destination specification. See the description of the -s (source) flag for a detailed
description of the syntax. The flag --dst is an alias for this option.

-i, --in-interface [!] [name]

Optional  name  of  an  interface  via  which  a  packet  is  received  (for  packets  entering  the 
INPUT,  FORWARD  and  PREROUTING  chains).  When  the  "!"  argument  is  used 
before  the  interface  name,  the  sense  is  inverted.  If  the  interface  name  ends  in  a  "+",  then 
any  interface  which  begins  with  this  name  will  match.  If  this  option  is  omitted,  the  string 
"+"  is  assumed,  which  will  match  with  any  interface  name. 

-o, --out-interface [!] [name]

Optional  name  of  an  interface  via  which  a  packet  is  going  to  be  sent  (for  packets  entering 
the  FORWARD,  OUTPUT  and  POSTROUTING  chains).  When  the  "!"  argument  is 
used  before  the  interface  name,  the  sense  is  inverted.  If  the  interface  name  ends  in  a  "+", 
then  any  interface  which  begins  with  this  name  will  match.  If  this  option  is  omitted,  the 
string  "+"  is  assumed,  which  will  match  with  any  interface  name. 
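The prefix-length form of the mask described under -s is a left-aligned run of 1 bits, so a mask of 24 is 0xFFFFFF00, i.e. 255.255.255.0. The small C helpers below compute that mask and apply it the way a source or destination match would; prefix_to_mask and addr_matches are hypothetical names, not part of iptables, and addresses are in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a prefix length (0..32) to an IPv4 netmask in host byte order.
 * Shifting a 32-bit value by 32 is undefined in C, so /0 is special-cased. */
uint32_t prefix_to_mask(unsigned int prefix_len)
{
    assert(prefix_len <= 32);
    return prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
}

/* Nonzero if addr falls inside net/prefix_len. */
int addr_matches(uint32_t addr, uint32_t net, unsigned int prefix_len)
{
    uint32_t mask = prefix_to_mask(prefix_len);
    return (addr & mask) == (net & mask);
}
```

A "!" before the address would simply invert the result of addr_matches.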

MATCH EXTENSIONS


iptables can use extended packet-matching modules. These are loaded in two ways:
implicitly, when -p or --protocol is specified, or with the -m or --match options,
followed by the matching module name; after these, various extra command line options
become available, depending on the specific module. You can specify multiple extended
match modules in one line, and you can use the -h or --help options after the module has
been specified to receive help specific to that module.

The following are included in the base package, and most of these can be preceded by a !
to invert the sense of the match.

tcp

These extensions are loaded if '--protocol tcp' is specified. It provides the following
options:

--source-port [!] port[:port]

Source port or port range specification. This can either be a service name or a port
number. An inclusive range can also be specified, using the format port:port. If the first
port is omitted, "0" is assumed; if the last is omitted, "65535" is assumed. If the first
port is greater than the second, they will be swapped. The flag --sport is an alias for this
option.

--destination-port [!] port[:port]

Destination port or port range specification. The flag --dport is an alias for this option.

--tcp-flags [!] mask comp

Match when the TCP flags are as specified. The first argument is the flags which we
should examine, written as a comma-separated list, and the second argument is a
comma-separated list of flags which must be set. Flags are: SYN ACK FIN RST URG PSH
ALL NONE. Hence the command

iptables -A FORWARD -p tcp --tcp-flags SYN,ACK,FIN,RST SYN

will  only  match  packets  with  the  SYN  flag  set,  and  the  ACK,  FIN  and  RST  flags  unset. 

[!] --syn

Only match TCP packets with the SYN bit set and the ACK and RST bits cleared. Such
packets are used to request TCP connection initiation; for example, blocking such packets
coming in an interface will prevent incoming TCP connections, but outgoing TCP
connections will be unaffected. It is equivalent to --tcp-flags SYN,RST,ACK SYN. If
the "!" flag precedes the "--syn", the sense of the option is inverted.

--tcp-option [!] number

Match if the given TCP option is set.
udp

These extensions are loaded if '--protocol udp' is specified. It provides the following
options:

--source-port [!] port[:port]

Source port or port range specification. See the description of the --source-port option of
the TCP extension for details.

--destination-port [!] port[:port]

Destination port or port range specification. See the description of the --destination-port
option of the TCP extension for details.

For example, the following rule partitions traffic directed at a local Web server into a
distinct slice with an upper bound of 60 Mbit/s on available network bandwidth.
(Leftover bandwidth then remains available for urgent or administrative actions.)

iptables  -t  ginsu  -A  PREROUTING  -p  tcp  --destination-port  http 
-j  LIMIT  --limit-bandwidth  62914560 
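The --limit-bandwidth argument in this rule, 62914560, equals 60 × 2^20, so the "mega" here appears to be binary (1,048,576) rather than decimal. Assuming the option takes bits per second, as this example suggests, a hypothetical helper for computing such arguments could be sketched as follows; the function name is ours and is not part of GINSU.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GINSU): convert binary megabits per
 * second (1 Mbit = 2^20 bits, as in the example above) into the value
 * assumed to be expected by --limit-bandwidth. */
static uint64_t ginsu_mbit_to_limit_arg(uint32_t mbit_per_sec)
{
    return (uint64_t)mbit_per_sec << 20;   /* multiply by 1,048,576 */
}
```

With this helper, an argument of 60 reproduces the 62914560 used above; a decimal reading of 60 Mbit/s would instead give 60000000.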

The ‘limit-queue’ option sets a maximum bound on the length of the slice queue. This is
not an aggregate limit, however: each child slice gets the same limit. Setting the queue
limit to zero effectively discards matching traffic at DPF demultiplexing time, which is
the most effective way to shed unwanted traffic. Setting the bandwidth limit of a slice to
zero will also drop traffic, but not as efficiently. One can install DPF filters that
“null route” obvious attack traffic straight to early discard as follows:

iptables -t ginsu -A PREROUTING -s netblock/mask --in-interface iface
-j LIMIT --limit-queue 0

The ‘renice’ limit and reservation options dynamically lower or raise, respectively, the
base system scheduling priority of the owner process for the slice. Setting a ‘renice’
reservation raises the process priority if its priority after slice association is too low.
Conversely, setting a ‘renice’ limit lowers the process priority if its priority after slice
association is too high. Note that Linux scheduling priority (nice) values range from -20
(highest priority) to 19 (lowest priority).
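The interaction of the two ‘renice’ settings can be sketched as a small C function. This is an illustration under assumptions, not GINSU code: the function and macro names are hypothetical, and we assume both settings are expressed as nice values. A reservation caps the nice value from above (guaranteeing a minimum priority), while a limit bounds it from below (capping the priority).

```c
/* Hypothetical sketch of the 'renice' adjustment (not GINSU code).
 * Nice values run from -20 (highest priority) to 19 (lowest). */
#define NICE_MIN (-20)
#define NICE_MAX 19

static int clamp_nice(int v)
{
    if (v < NICE_MIN) return NICE_MIN;
    if (v > NICE_MAX) return NICE_MAX;
    return v;
}

/* has_reserve / has_limit say whether each setting is present for the slice. */
static int ginsu_adjust_nice(int nice, int has_reserve, int reserve_nice,
                             int has_limit, int limit_nice)
{
    if (has_reserve && nice > reserve_nice)
        nice = reserve_nice;      /* priority too low: raise it */
    if (has_limit && nice < limit_nice)
        nice = limit_nice;        /* priority too high: lower it */
    return clamp_nice(nice);
}
```

If a reservation and a limit conflict, this sketch applies the limit last, so the limit wins; the GINSU prototype may resolve such conflicts differently.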

Important Note: Slice partitioning rules are applied only when new slices are created.
Ideally, newly installed limits and reservations should be applied retroactively to existing
slices; however, our prototype does not do this. Accordingly, a GINSU administrator
should define slice/traffic sorting rules early, before any such traffic is processed by the
host, or, if protective traffic limits have been installed, arrange for the affected service to
restart any existing connections.

When the ginsu_low module is first loaded, it waits for additional signals from the user
before commencing full operation. The GINSU administrator may separately enable or
disable slice sorting, lazy receiver processing, or ingress traffic-shaping by setting a
special GINSU-specific socket option using the standard UNIX setsockopt() function.
These functions may be enabled or disabled at any time. The following snippet of C code
demonstrates how this is done:


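#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>     /* SOL_IP */

#define GINSU_SIOCTL          40   /* GINSU-specific IP-level socket option */

#define GINSU_SIOCTL_START    1
#define GINSU_SIOCTL_STOP     2

#define GINSU_SUBSYS_SLICE    1    /* slice sorting */
#define GINSU_SUBSYS_SHAPING  2    /* ingress traffic-shaping */
#define GINSU_SUBSYS_LRP      3    /* lazy receiver processing */

int main(void)
{
    int fd;
    struct { int command; int what; } args;

    /* get a socket */
    fd = socket(PF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* start slice sorting */
    args.command = GINSU_SIOCTL_START;
    args.what = GINSU_SUBSYS_SLICE;
    if (setsockopt(fd, SOL_IP, GINSU_SIOCTL, &args, sizeof(args)) < 0)
        perror("setsockopt(START SLICE)");

    /* start lazy receiver processing */
    args.command = GINSU_SIOCTL_START;
    args.what = GINSU_SUBSYS_LRP;
    if (setsockopt(fd, SOL_IP, GINSU_SIOCTL, &args, sizeof(args)) < 0)
        perror("setsockopt(START LRP)");

    /* stop lazy receiver processing */
    args.command = GINSU_SIOCTL_STOP;
    args.what = GINSU_SUBSYS_LRP;
    if (setsockopt(fd, SOL_IP, GINSU_SIOCTL, &args, sizeof(args)) < 0)
        perror("setsockopt(STOP LRP)");

    /* when done, close (release) the socket */
    close(fd);
    return 0;
}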


The  GINSU  source  distribution  includes  simple  utilities  written  in  Perl  for  managing  this 
process  from  the  command  line.  The  ‘up’  and  ‘down’  utilities  can  be  used  once  the 
GINSU  modules  have  been  loaded  as  follows: 


up:    usage: up <subsystem1> [... <subsystemN>]

       where <subsystem> is one of SLICE, LRP, or SHAPING

down:  usage: down <subsystem1> [... <subsystemN>]

       where <subsystem> is one of SLICE, LRP, or SHAPING

Note that once slice sorting has commenced, it must be stopped before the ginsu_low
module may be unloaded with the Linux ‘rmmod’ (remove module) loadable kernel
module management utility.
2.2.2 Startup Sequence


Once the ginsu_low module is initialized, it immediately begins intercepting packets as
they arrive from the network. This is done using a hook function registered with the
Linux ‘netfilter’ packet interception facility on the NF_IP_PRE_ROUTING chain. From
within this hook function, GINSU gains control upon the reception of a packet and may
modify the packet, steal it, or do nothing with it before returning control to the Linux
kernel. Within GINSU, each packet is first processed by the dynamic packet filter (DPF)
facility, which attempts to match the contents of the packet against a trie of
{offset,length,mask,value} tuples. These tuples are installed automatically by ginsu_low as
sockets are created and bound to transport endpoints by applications. A DPF lookup
operation returns either the matching slice pointer or, if no explicit listener was found, the
root (default) slice pointer. The default slice operates with low priority. This arrangement
automatically prioritizes expected traffic over unexpected, potentially unauthorized (or
attack) traffic.
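The lookup step can be sketched in user-space C. For brevity this sketch scans a flat list of filters instead of walking DPF's shared-prefix trie, and it loads multi-byte fields in host byte order; the type names, ROOT_SID, and dpf_classify() are hypothetical and do not correspond to GINSU's internal dpf_ir representation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One {offset, length, mask, value} tuple; length is 1, 2 or 4 bytes. */
struct dpf_tuple {
    uint16_t offset;
    uint8_t  length;
    uint32_t mask;
    uint32_t value;
};

/* A filter is a conjunction of tuples that selects one slice. */
struct dpf_filter {
    const struct dpf_tuple *tuples;
    size_t ntuples;
    int sid;                      /* slice identifier returned on a match */
};

#define ROOT_SID 0                /* hypothetical root (default) slice */

/* Load `len` bytes at `off` in host byte order; 0 if out of bounds. */
static int dpf_load(const uint8_t *pkt, size_t pktlen, uint16_t off,
                    uint8_t len, uint32_t *out)
{
    if ((size_t)off + len > pktlen) return 0;
    if (len == 1) { *out = pkt[off]; return 1; }
    if (len == 2) { uint16_t v; memcpy(&v, pkt + off, 2); *out = v; return 1; }
    if (len == 4) { uint32_t v; memcpy(&v, pkt + off, 4); *out = v; return 1; }
    return 0;
}

/* Return the sid of the first filter whose tuples all match,
 * or ROOT_SID when no explicit listener matches. */
static int dpf_classify(const struct dpf_filter *f, size_t nf,
                        const uint8_t *pkt, size_t pktlen)
{
    for (size_t i = 0; i < nf; i++) {
        size_t j;
        for (j = 0; j < f[i].ntuples; j++) {
            const struct dpf_tuple *t = &f[i].tuples[j];
            uint32_t v;
            if (!dpf_load(pkt, pktlen, t->offset, t->length, &v) ||
                (v & t->mask) != t->value)
                break;            /* this filter fails; try the next */
        }
        if (j == f[i].ntuples)
            return f[i].sid;
    }
    return ROOT_SID;
}
```

A packet that satisfies every tuple of a filter is attributed to that filter's slice; anything else falls through to the low-priority root slice, which is what gives expected traffic precedence over unexpected traffic.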


2.3  The  Dynamic  Packet  Filter  (DPF)  API 

The  DPF  trie  is  managed  by  the  following  functions: 


void ginsu_dpf_begin (struct dpf_ir **ir);
void ginsu_dpf_end (struct dpf_ir **ir);

int ginsu_dpf_insert (struct ginsu_sock *ss, struct dpf_ir *ir);
int ginsu_dpf_delete (struct ginsu_sock *ss);
void ginsu_dpf_printir (char *buf, struct dpf_ir *ir);
int ginsu_dpf_atoms (struct dpf_ir *ir);


DPF rules are constructed out of one or more atoms specified with these functions.

Filter creation routines: nbits corresponds to 8, 16, or 32, depending on the
operation. msg[byte_offset:nbits] means to load nbits of the message at
byte_offset.

Compare message value to constant:

msg[byte_offset:nbits] & mask == val

void ginsu_dpf_meq8 (struct dpf_ir *ir, u_int16_t byte_offset,
    u_int8_t mask, u_int8_t val);

void ginsu_dpf_meq16 (struct dpf_ir *ir, u_int16_t byte_offset,
    u_int16_t mask, u_int16_t val);

void ginsu_dpf_meq32 (struct dpf_ir *ir, u_int16_t byte_offset,
    u_int32_t mask, u_int32_t val);

Compare message value to constant (negated):

msg[byte_offset:nbits] & mask != val

void ginsu_dpf_not_meq8 (struct dpf_ir *ir, u_int16_t byte_offset,
    u_int8_t mask, u_int8_t val);

void ginsu_dpf_not_meq16 (struct dpf_ir *ir, u_int16_t byte_offset,
    u_int16_t mask, u_int16_t val);

void ginsu_dpf_not_meq32 (struct dpf_ir *ir, u_int16_t byte_offset,
    u_int32_t mask, u_int32_t val);

Shift the base message pointer:

msg += (msg[byte_offset:nbits] & mask) << shift;

void ginsu_dpf_mshift8 (struct dpf_ir *ir, u_int16_t offset,
    u_int8_t mask, u_int8_t shift);

void ginsu_dpf_mshift16 (struct dpf_ir *ir, u_int16_t offset,
    u_int16_t mask, u_int8_t shift);

void ginsu_dpf_mshift32 (struct dpf_ir *ir, u_int16_t offset,
    u_int32_t mask, u_int8_t shift);

Shift the base message pointer by a constant:

msg += nbytes;

void ginsu_dpf_shifti (struct dpf_ir *ir, u_int16_t nbytes);
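The semantics of these atoms can be illustrated with a small user-space interpreter. The encoding below (an enum of operations applied in order, with a running base pointer msg) is a hypothetical stand-in for the dpf_ir representation. Multi-byte loads are done in little-endian order on the assumption that filter values resemble those produced on an IA-32 host, such as the filter shown in the per-socket /proc listing later in this document.

```c
#include <stddef.h>
#include <stdint.h>

enum dpf_op { DPF_MEQ, DPF_NOT_MEQ, DPF_MSHIFT, DPF_SHIFTI };

struct dpf_atom {
    enum dpf_op op;
    uint16_t off;      /* byte offset from msg; for SHIFTI, nbytes */
    uint8_t  nbits;    /* 8, 16 or 32 (unused by SHIFTI) */
    uint32_t mask;
    uint32_t val;      /* compare value (MEQ/NOT_MEQ) or shift count (MSHIFT) */
};

/* Little-endian load of nbits at base + off (IA-32 host order assumed);
 * returns 0 if the load would run past the end of the packet. */
static int dpf_ld(const uint8_t *pkt, size_t len, size_t base, uint16_t off,
                  uint8_t nbits, uint32_t *out)
{
    size_t n = nbits / 8, pos = base + off;
    if (n == 0 || n > 4 || pos + n > len) return 0;
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint32_t)pkt[pos + i] << (8 * i);
    *out = v;
    return 1;
}

/* Apply atoms in order; every comparison must hold. Returns 1 on match. */
static int dpf_eval(const struct dpf_atom *a, size_t na,
                    const uint8_t *pkt, size_t len)
{
    size_t msg = 0;                        /* base message pointer, as an offset */
    for (size_t i = 0; i < na; i++) {
        uint32_t v;
        switch (a[i].op) {
        case DPF_MEQ:                      /* msg[off:nbits] & mask == val */
        case DPF_NOT_MEQ:                  /* msg[off:nbits] & mask != val */
            if (!dpf_ld(pkt, len, msg, a[i].off, a[i].nbits, &v)) return 0;
            if (((v & a[i].mask) == a[i].val) != (a[i].op == DPF_MEQ)) return 0;
            break;
        case DPF_MSHIFT:                   /* msg += (msg[off:nbits] & mask) << shift */
            if (!dpf_ld(pkt, len, msg, a[i].off, a[i].nbits, &v)) return 0;
            msg += (size_t)(v & a[i].mask) << a[i].val;
            break;
        case DPF_SHIFTI:                   /* msg += nbytes */
            msg += a[i].off;
            break;
        }
    }
    return 1;
}
```

For example, the sequence shifti(14), mshift8(0, 0x0f, 2) advances msg past a 14-byte Ethernet header and then by the IPv4 IHL field × 4, after which meq16(2, 0xffff, 0x180) tests the TCP destination port under the little-endian assumption above.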


The GINSU DPF facility is a port of the dynamic packet-filtering subsystem of the MIT
Exokernel. Accordingly, it has its strengths (simplicity and efficiency) and its weaknesses
(lack of support for packet headers of variable size, and for fragmented IP packets). We
have partially implemented a countermeasure for the latter shortcoming: a special GINSU
kernel process that consumes fragmented IP packets, reassembles them, and then either
retries the DPF process once all fragments have been received, or discards partially
reassembled packets if not all fragments are received within a few seconds. However, this
code remains untested and will probably not function correctly without minor bug fixes.
Regarding the former shortcoming, the AMP project at Network Associates Laboratories
encountered this limitation and addressed it by augmenting DPF with a special operator
that shifts the base message pointer based on specific knowledge of the IPv4 and TCP
header structures. In this way, filters could be specified as if the variable portions of those
headers simply did not exist. A refined GINSU prototype could implement similar
functionality.


2.4  Queue  Management 


Once traffic is sorted, it is categorized for traffic-shaping according to any bandwidth
limits or reservations associated with the matching slice. Traffic-shaping is performed by
a generic framework that can support multiple arbitrary queuing disciplines. Currently,
we provide a hierarchical token bucket (HTB) queuing discipline, implemented in the
separate ginsu_sch_htb module, and a simple rate-limiting scheme implemented within
ginsu_low that does not depend on any external module. Under normal operation, the
ginsu_sch_htb module is automatically loaded and configured when bandwidth limits or
reservations are set via the GINSU iptables interface.
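HTB arranges token buckets into a class hierarchy in which classes can borrow unused bandwidth from their parents. The single-level building block can be sketched as below; this is a generic token bucket with hypothetical names, not code from ginsu_sch_htb, and it assumes rates in bytes per second and monotonically increasing timestamps in microseconds.

```c
#include <stdint.h>

/* Minimal single-level token bucket (hypothetical, not ginsu_sch_htb). */
struct tbucket {
    uint64_t rate;     /* refill rate, bytes per second */
    uint64_t burst;    /* bucket depth, bytes */
    uint64_t tokens;   /* bytes currently available */
    uint64_t last_us;  /* time of last refill, microseconds */
};

static void tb_init(struct tbucket *tb, uint64_t rate, uint64_t burst,
                    uint64_t now_us)
{
    tb->rate = rate;
    tb->burst = burst;
    tb->tokens = burst;           /* start with a full bucket */
    tb->last_us = now_us;
}

/* Returns 1 if a packet of `len` bytes conforms (and consumes tokens),
 * 0 if it must wait in the queue or be dropped. */
static int tb_conform(struct tbucket *tb, uint64_t len, uint64_t now_us)
{
    uint64_t add = (now_us - tb->last_us) * tb->rate / 1000000;
    if (add > 0) {                /* refill, capped at the bucket depth */
        tb->tokens = (tb->tokens + add > tb->burst) ? tb->burst
                                                    : tb->tokens + add;
        tb->last_us = now_us;
    }
    if (len > tb->tokens)
        return 0;
    tb->tokens -= len;
    return 1;
}
```

A packet that does not conform would be held in the slice queue (or dropped, if the queue is full) until enough tokens accumulate.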

Ingress  packet-queuing  disciplines  are  managed  with  these  functions: 

int  ginsu_qdisc_register  (struct  ginsu_qdisc_ops  *); 

int ginsu_qdisc_unregister (struct ginsu_qdisc_ops *);

Shared rate tables are managed with these functions:


struct ginsu_qdisc_rtab *ginsu_qdisc_get_rtab (struct ginsu_qdisc_ratespec *,
    u_int32_t *);

void ginsu_qdisc_put_rtab (struct ginsu_qdisc_rtab *);

Internally,  queuing  disciplines  and  their  associated  traffic  classes  are  instantiated, 
destroyed,  and  modified  with  these  functions: 


struct ginsu_qdisc *ginsu_qdisc_create (char *, u_int32_t,
    unsigned long *);

struct ginsu_qdisc *ginsu_qdisc_find (struct net_device *,
    u_int32_t);

int ginsu_qdisc_destroy (struct ginsu_qdisc *);

int ginsu_qdisc_graft (struct net_device *, struct ginsu_qdisc *,
    u_int32_t, struct ginsu_qdisc *, struct ginsu_qdisc **);

int ginsu_qdisc_change_class (struct ginsu_qdisc *, u_int32_t,
    u_int32_t, unsigned long *);


In cooperation with the ginsu_sch_htb module, ginsu_low provides hierarchical token
bucket traffic-shaping capabilities by default. The theory of operation of HTB is outside
the scope of this document; more information may be found at the Linux HTB home page,
at http://luxik.cdi.cz/~devik/qos/htb/ .

The ginsu_low module is the primary resource monitor. Internally it uses three major
structures to account for network stack and host OS resource usage: the ginsu_task,
ginsu_slice, and ginsu_sock objects.


2.5  Slice  Scheduling 

One ginsu_task object is maintained for every OS task (also known as a process). Using
this object, GINSU manages a run queue for slices with pending work. Recall the earlier
discussion of lazy receiver processing: LRP defers packet-processing work when packets
that are not destined for the currently executing process are received from the network.
Such packets are queued in the incoming queue of their target slice. This slice, in turn, is
flagged as runnable and placed on the run queue of the task that owns the slice. At every
context switch, GINSU gains control from within the system scheduler logic just prior to
invocation of user-level code. (To accommodate this control transfer, the stock Linux
kernel scheduler must be modified with a small patch included in the GINSU source
distribution; the kernel must then be recompiled and reinstalled.) Here, pending slices are
removed from the run queue within the corresponding ginsu_task object and their incoming
packet queues are serviced. To prevent excessive network-level work from consuming the
entire time slice of the newly running process, further LRP processing is deferred until the
next available time slice if LRP processing consumes more than 60% of the current time
slice. (This corresponds to a period of six milliseconds on unmodified Linux kernels for the
Intel IA-32 architecture.) This ensures that the user-level application code has a
reasonable amount of CPU time in which to make progress given the new input from the
network. Currently, this limit is hard-coded, but it may be adjusted manually and put into
effect by recompiling and reinstalling the GINSU modules.
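The 60% rule can be sketched as a budgeted service loop. The sketch is illustrative only: the function name and the abstract per-packet cost units are assumptions, and it checks the budget before each packet, so the packet that crosses the threshold is completed before the remainder is deferred. With a 10 ms timeslice the budget is 6 ms, consistent with the six-millisecond figure above.

```c
#include <stddef.h>

/* Hypothetical sketch of the LRP time budget (not GINSU code). */
#define LRP_BUDGET_PCT 60   /* hard-coded in GINSU, per the text above */

/* cost[i] is the (abstract) time needed to process queued packet i.
 * Returns how many packets were processed before LRP work exceeded
 * LRP_BUDGET_PCT percent of the timeslice; the rest are deferred. */
static size_t lrp_service(const unsigned *cost, size_t n, unsigned timeslice)
{
    unsigned budget = timeslice * LRP_BUDGET_PCT / 100;
    unsigned used = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (used > budget)          /* over budget: defer remaining work */
            break;
        used += cost[i];            /* process packet i */
    }
    return i;
}
```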

Every ginsu_task object also contains a reference to the root slice for the process. To
simplify the slice hierarchy, every task is allocated a root slice, which is thereafter the
initial and default owner for every slice subsequently created for traffic destined for that
process. Thus, within the GINSU resource hierarchy, a ginsu_task object is ultimately an
ancestor of every slice created. When and if the task is destroyed by the operating system,
the resource framework, as it releases the task object, automatically and efficiently
releases any child ginsu_slice objects, any children of those slices, and so on, preserving
endpoint teardown semantics upon application exit. The ginsu_low module therefore does
not need to, and does not, manage this process explicitly.
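The cascading release can be sketched with a simple child/sibling tree. The struct res type, res_new(), and res_release() are hypothetical stand-ins for the GINSU resource framework, and the released counter exists only to make the behavior observable.

```c
#include <stdlib.h>

/* Hypothetical resource node: task -> slices -> sockets. */
struct res {
    struct res *child;    /* first child */
    struct res *sibling;  /* next sibling under the same parent */
};

static int released;      /* nodes freed so far (illustration only) */

/* Allocate a node and link it under `parent` (NULL for a root). */
static struct res *res_new(struct res *parent)
{
    struct res *r = calloc(1, sizeof *r);
    if (r && parent) {
        r->sibling = parent->child;
        parent->child = r;
    }
    return r;
}

/* Release `r` and its entire subtree, children first. The caller must
 * already have unlinked `r` from its parent (roots need no unlinking). */
static void res_release(struct res *r)
{
    struct res *c = r->child;
    while (c) {
        struct res *next = c->sibling;
        res_release(c);
        c = next;
    }
    free(r);
    released++;
}
```

Releasing a task node therefore frees its root slice, every slice beneath it, and every socket object attached to those slices, without the caller tracking them individually.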


struct ginsu_task
{
    unsigned long magic;
    spinlock_t lock;
    unsigned long flags;                  /* operational mode */
    struct task_struct *task;             /* corresponding Linux task */

    /* slice run queue */
    TAILQ_HEAD(ginsu_slice) runq;         /* slices with work pending */
    TAILQ_HEAD(ginsu_slice) doneq;        /* slices with work completed */
    struct ginsu_slice *root;             /* root slice for this task */
    ginsu_resource_t r;                   /* pointer to resource for this task */
};

Figure 4: The GINSU Task Structure

2.5.1  Accounting  Practices 

For every traffic slice allocated for traffic-sorting or resource management bookkeeping,
a ginsu_slice object is created and associated with an owner process. Every slice is
identified by a locally unique integer identifier, termed a slice identifier (or SID). Each
slice also has two work queues into which units of network protocol processing work may
be placed to be serviced at a later time: one for packets received from the network, and
one for packets scheduled by the owner process for transmission. The ginsu_slice object
also contains state for use in limiting CPU and network bandwidth consumption in a
hierarchical manner.
struct ginsu_slice
{
    unsigned long magic;
    spinlock_t lock;
    unsigned long flags;                  /* operational mode */
    struct ginsu_task *owner;             /* owner GINSU task */
    int sid;                              /* slice identifier */

    /* work queues */
    struct {
        struct sk_buff_head q;
        atomic_t count;
        atomic_t avail;
        ginsu_sliceq_func_t func;
    } sliceq[2];                          /* network work queues */
                                          /* sliceq[0] is RX queue */
                                          /* sliceq[1] is TX queue */

    /* CPU limit bookkeeping */
    unsigned long ts_cap;                 /* max timeslice portion allowed (1..100) */
    unsigned long ts_cur_cap;             /* currently consumed timeslice portion */
    volatile unsigned long *ts_cap_p;     /* current CPU limit in force */
    volatile unsigned long *ts_cur_cap_p; /* current use for limit in force */

    /* linkage */
    TAILQ_ENTRY(ginsu_slice) runq_link;
    ginsu_resource_t r;                   /* pointer to resource for this slice */
    void *root_dir;                       /* procfs root dirent for this slice */
    void *sock_dir;                       /* procfs socket dirent for this slice */

    /* simple (non-HTB) shaping bookkeeping */
    unsigned long max_rate;               /* max rate (per second) */
    unsigned long last;                   /* rate for current 1-second interval */
};

Figure 5: The GINSU Slice Structure

For every high-level Linux socket created, a corresponding ginsu_sock object is created
in order to store GINSU-specific per-socket state. This state includes: the identifier of any
DPF filter inserted to direct packets to the appropriate active endpoint; the identifier of
the ingress traffic-shaping class (if any) to which the endpoint’s traffic will belong; and a
list of all low-level Linux sockets created to service the endpoint. (More than one
low-level Linux socket may be created over the lifetime of the corresponding high-level
Linux socket.) Every ginsu_sock object is associated with an owner slice. If the owning
slice is destroyed, the resource management framework implicitly releases all child
ginsu_sock objects as well. Also, during GINSU socket object creation, we overwrite a
method pointer in the Linux kernel socket structure so that we receive notification when
and if all low-level Linux sockets for the endpoint are destroyed. When this occurs, the
corresponding GINSU state is released.




struct ginsu_sock
{
    unsigned long magic;
    spinlock_t lock;
    struct ginsu_slice *owner;            /* owner GINSU slice */
    ginsu_resource_t r;                   /* pointer to resource for this socket */
    unsigned long mark;                   /* value to mark packets with for ingress QoS */
    int dpf_fid;                          /* DPF filter identifier */
    int nr_sk;                            /* count of associated low-level Linux sockets */
    LIST_HEAD(ginsu_sock_el) sk_list;     /* list of associated low-level Linux sks */
    void *proc_data;                      /* DPF filter text for ginsu_proc */

    /* pointers to simple (non-HTB) shaping bookkeeping in force */
    volatile unsigned long *max_rate_p;
    volatile unsigned long *last_p;
};


Figure  6:  The  GINSU  Sock  Structure 
2.5.2  Resource  Constraint  Enforcement 

Resource usage is tracked in different ways for the major resource classes. Flow
bandwidth monitoring is implicit in the operation of the HTB component, so this state is
distributed among the installed flow class configuration. GINSU also provides a very
simple rate-limiting scheme that may be used in lieu of HTB queuing for LIMIT targets
only. If this latter scheme is used, flow bandwidth monitoring is performed using state
stored in ginsu_slice structures. The per-slice count of currently connected network
flows is maintained in resource attributes within each slice resource. Likewise, slice
resources are annotated with any connection limits installed by the administrator. At
connection establishment time, within either the connect() or accept() system calls, a
breadth-first search of the slice resource hierarchy is performed. If the allowed connection
count is exceeded, or a new connection would prevent a reservation from being serviced,
the new connection is rejected. CPU timeslice limits are maintained within the resource
data for each slice. When child slices are associated with a slice that has an active CPU
limit, pointers within those children are updated to point to the appropriate fields in the
parent. During LRP processing, these pointers are followed to ensure that the aggregate
processing time of all children of a CPU-limited slice does not exceed the limit in force.
Per-slice socket buffer (SKB) limits are enforced whenever incoming work is to be
posted to a work queue. If an in-force queue limit would be exceeded, the incoming
packet is dropped instead. Likewise, if the size of the incoming packet, when added to the
sum of the sizes of all queued packets, would exceed a limit on the maximum allowed
buffer memory for a slice, the incoming packet is also dropped.
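The two SKB admission checks can be sketched as follows. The structure and function names are hypothetical; the sketch assumes both a packet-count limit and a byte limit are always in force, and it omits the locking and atomic counters a kernel implementation would need.

```c
#include <stddef.h>

/* Hypothetical per-slice queue limits and current occupancy. */
struct sliceq_limits {
    size_t max_len;     /* max queued packets */
    size_t max_bytes;   /* max queued buffer memory, bytes */
};

struct sliceq_state {
    size_t len;         /* packets currently queued */
    size_t bytes;       /* bytes currently queued */
};

/* Returns 1 and accounts for the packet if it may be posted to the
 * work queue, 0 if it must be dropped instead. */
static int sliceq_admit(struct sliceq_state *q, const struct sliceq_limits *lim,
                        size_t pkt_bytes)
{
    if (q->len + 1 > lim->max_len)
        return 0;                             /* queue-length limit */
    if (q->bytes + pkt_bytes > lim->max_bytes)
        return 0;                             /* buffer-memory limit */
    q->len++;
    q->bytes += pkt_bytes;
    return 1;
}
```

With max_len set to 0 every packet is refused, which corresponds to the --limit-queue 0 early-discard rule described earlier.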

When resource limits are exceeded, or when resource utilization approaches a point where
load must be shed in order to preserve a reservation, GINSU automatically acts to enforce
the limits and reservations in force. As these actions are undertaken, messages are sent to
the system logging facility indicating the application(s), endpoint(s), and resource(s)
identified as the cause of the out-of-line condition, the action taken, and the result of that
action.
2.5.2.1 Resource Constraint Violation Logging


When  resource  limits  are  exceeded,  or  when  current  system  conditions  require  proactive 
action  to  maintain  a  specified  resource  reservation,  GINSU  will  send  a  detailed  message 
to  the  system  logging  (syslog)  facility.  A  table  describing  the  format  of  these  messages 
and  presenting  an  example  follows: 


timestamp          module   facility   resource       message data
Jun 30 14:21:26    GINSU:   SLICE:     Socket limit   sid=31 now=100 limit=100

The following table enumerates all resources managed by ginsu_low for which messages
may appear in the syslog upon an exception:


facility   resource       exception and message format

SLICE      sockets        Slice socket limit exceeded.
                          "limit exceeded: sid=%d euid=%d egid=%d limit=%d"

           connections    Slice connection limit exceeded (includes half-open).
                          "limit exceeded: sid=%d euid=%d egid=%d limit=%d"

           queue length   Slice queue entry count limit exceeded. Note: messages
                          for this exception are rate-limited.
                          "limit exceeded: sid=%d euid=%d egid=%d limit=%d"

TASK       timeslice      Timeslice limit in lazy receiver processing exceeded.
                          "limit exceeded: pid=%d sid=%d euid=%d egid=%d limit=%d%%"

SHAPING    bandwidth      A moderate to high drop rate indicates that the class
                          bandwidth limit was exceeded. Note: messages for this
                          exception are rate-limited.
                          "limit exceeded: sid=%d euid=%d egid=%d limit=%d"
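A SLICE-facility message could be assembled as sketched below. The exact prefix layout ("GINSU: SLICE: ") is an assumption based on the module and facility columns of the example row, the function name is hypothetical, and snprintf() stands in for the kernel's printk() so that the resulting text can be inspected.

```c
#include <stdio.h>

/* Hypothetical formatter for a limit-exceeded message in the shape of
 * the table above. Returns the length snprintf() reports. */
static int ginsu_fmt_limit_msg(char *buf, size_t n, const char *facility,
                               int sid, int euid, int egid, int limit)
{
    return snprintf(buf, n,
                    "GINSU: %s: limit exceeded: sid=%d euid=%d egid=%d limit=%d",
                    facility, sid, euid, egid, limit);
}
```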
2.6 ginsu_proc


The ginsu_proc module provides a read-only view into internal GINSU state via a file tree in the Linux
/proc filesystem.

/proc/net/ginsu/
 |
 +-- stat
 |
 +-- slice/
      |
      +-- <slice>/
      |    |
      |    +-- stat
      |    |
      |    +-- sock/
      |         |
      |         +-- <socket>
      |         +-- <socket>
      |
      +-- <slice>/


The top-level, or global, ‘stat’ file provides a read-only listing of system-wide state,
counters, and statistics. Various parts of GINSU register functions on the
GINSU_HOOK_GET_PROC_STATS hook. When the user reads ‘stat’, this hook is
invoked and the results are then passed to userspace. Below is an example listing from
the global ‘stat’ file:


modeflags:                 SLICE LRP   current GINSU operating mode
slices:                    13          number of slices active in system
sockets:                   56          number of sockets active in system
slice_immed:               78644       packets processed immediately (in context)
slice_posted:              419857      packets deferred into slice queues
slice_lrp:                 419857      packets deferred for lazy receiver processing
slice_dropped:             491         packets dropped from slice queues
slice_oom:                 0           packets dropped for insufficient buffer space
cpu_overlimit:             0           global count of timeslice over limit conditions
queue_overlimit:           0           global count of slice queue over limit conditions
sock_overlimit:            0           global count of socket alloc over limit conditions
connect_overlimit:         0           global count of connection over limit conditions
dpfd_rx:                   0           packets received by defragmenter
dpfd_inject:               0           packets reinjected by defragmenter
dpf_atoms:                 59          number of active DPF atoms
dpf_atoms_highwater:       332         max number of DPF atoms active at one time
dpf_match_root:            2275        packets matching trie root (default)
dpf_match_leaf:            496221      packets matching trie leaf
pkt_rx:                    498496      packets received by host
pkt_tx:                    307652      packets transmitted by host
pkt_should_lrp:            0           packets received with LRP disabled
pkt_marked:                193277      packets marked by ingress traffic-shaping
pkt_dropped:               1159        packets dropped by ingress traffic-shaping
pkt_stolen:                419857      packets “stolen” from Linux by GINSU
pkt_replaced:              4732        packets replaced by ingress traffic-shaping
pkt_bandwidth_overlimit:   0           global count of bandwidth over limit conditions
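Because each line of the global ‘stat’ file has the form "name: value", a user-space monitor can extract individual counters with ordinary stdio calls. The sketch below assumes that layout (tolerating spaces before the colon); ginsu_read_stat() is a hypothetical helper, and the path /proc/net/ginsu/stat is taken from the file tree above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one numeric counter (e.g. "pkt_rx") from a
 * GINSU stat file in "name: value" form. Returns 0 on success, -1 if
 * the file cannot be opened or the counter is absent or non-numeric. */
static int ginsu_read_stat(const char *path, const char *name,
                           unsigned long *out)
{
    FILE *f = fopen(path, "r");
    char line[256];
    size_t nlen = strlen(name);
    int rc = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, name, nlen) != 0)
            continue;
        const char *p = line + nlen;
        while (*p == ' ')               /* allow "name :" as well as "name:" */
            p++;
        if (*p != ':')                  /* longer name sharing the prefix */
            continue;
        if (sscanf(p + 1, "%lu", out) == 1) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

For example, ginsu_read_stat("/proc/net/ginsu/stat", "slice_dropped", &v) sampled periodically would show the rate at which slice queues shed traffic.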


The per-slice ‘stat’ file provides statistics and state information regarding a particular
slice, as well as the values of any counts or limits that may be present as an attribute in
the slice’s resource structure. Below is an example listing from a per-slice ‘stat’ file:


/proc/net/ginsu/slice/7/stat:

id:              7                  slice identifier
owner:           622 (xinetd)       owner process PID and name
flags:           ON_RUNQ            current slice operating mode
sliceq[RX]:      count 3 avail 8    receive slice queue length and limit
sliceq[TX]:      count 0 avail 8    transmit slice queue length and limit
nr_sockets:      1                  number of sockets influenced by this slice
nr_connections:  1                  number of connections influenced by this slice


Below is an example listing from a per-socket file:

/proc/net/ginsu/slice/7/sock/c0ab88de:

owner: 7                           owner slice identifier
fid: 9                             DPF filter identifier
mark: 0x0000                       ingress traffic queueing mark
filter: [                          DPF filter
    m[14:8] & 0xf0 == 0x40 &&
    m[20:16] & 0xff3f == 0x0 &&
    m[23:8] & 0xff == 0x6 &&
    m[30:32] & 0xffffffff == 0x100007f &&
    m[36:16] & 0xffff == 0x180
]
nr_sk: 1                           number of associated low-level Linux sockets
socket: sk=cf9e9460 [              low-level Linux socket state
    family: 2
    type: 1
    protocol: 6
    refcnt: 1
    rcvbuf: 87380
    sndbuf: 16384
    rmem_alloc: 0
    wmem_alloc: 0
    wmem_queued: 0
    receive_queue_len: 0
    write_queue_len: 0
]
3. Demonstration


Our demonstration illustrates a typical application of a WebShield-like device as a
boundary security gateway with a network-accessible management interface. We show
how such a device can be used to protect a subnet by inspecting web and messaging
traffic, but must often be “over engineered” (provisioned far in excess of typical capacity)
to guarantee service levels. For instance, the demonstration shows that a gateway device
cannot easily ensure service levels that satisfy both client throughput and management
interaction while under heavy load.

The demonstration shows the benefit of GINSU processing: for a given level of hostile (or
unwarranted) traffic, the same hardware can appear both more responsive and more
tolerant of load spikes. Additionally, we demonstrate how, without the GINSU slice
isolation features, hostile traffic can adversely affect traffic through the gateway and
interfere with a protected client’s use of the network. GINSU, through its use of “Lazy
Receiver Processing” and “Per-Slice Queues”, is able to efficiently and effectively shed
excess traffic based on administrator-applied limits and reservations, before that traffic is
able to consume scarce resources on the gateway device.


4.  Future  Work 

The Guaranteed Internet Stack Utilization (GINSU) project comes to the FTN program
via the ATIAS Survivable Wired and Wireless Infrastructure for the Military (SWWIM)
Focused Research Topic. GINSU seeks to guarantee network accessibility of an end host,
even in the event of an attempted denial-of-service attack. To provide this guarantee of
accessibility, we augment an existing operating system’s network stack and kernel with
fine-grained resource monitoring. We provide mechanisms for reserving and/or limiting
scarce resources based on the ultimate consumer of those resources. We enable
attribution of resources based on a rich collection of packet, user, and process attributes.
The combination of resource partitioning and attribution gives administrators considerable
flexibility and power in determining how their systems are to be used.

GINSU  technology  has  been  proposed  in  three  upcoming  research  projects.  We  feel  that 
GINSU  brings  a  powerful  policy  enforcement  and  resource  tracking  mechanism  to  these 
projects,  and  the  integration  of  GINSU  into  larger,  enterprise-scale  systems  adds  a 
number  of  useful  extensions  to  the  GINSU  feature  set. 